Hospital Activity and Patient Demographics

https://www.opendata.nhs.scot/dataset/inpatient-and-daycase-activity/resource/00c00ecc-b533-426e-a433-42d79bdea5d4

Activity by Board of Treatment, Age and Sex

library(tidyverse)
library(infer)
age_sex <- read_csv("../clean_data/age_and_sex.csv")

Sex

# Number of "stays" (total number of days patients spend in hospital during a continuous inpatient stay)
# Consistently Female > Male by around 50,000 per quarter
age_sex %>% 
  select(year_quarter, sex, stays) %>%
  group_by(year_quarter, sex) %>%
  summarise(stays_count = sum(stays)) %>% 
  ungroup() %>% 
  ggplot(aes(x = year_quarter, y = stays_count, colour = sex, group = sex)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Stays by sex",
    x = "Quarter ending",
    y = "Stay count")
`summarise()` has grouped output by 'year_quarter'. You can override using the `.groups` argument.

# Mean length of stay
# No clear trend...maybe Female > Male on average
age_sex %>% 
  select(year_quarter, sex, average_length_of_stay) %>% 
  group_by(year_quarter, sex) %>% 
  summarise(mean_stay = mean(average_length_of_stay, na.rm = TRUE)) %>% 
  ungroup() %>%
  ggplot(aes(x = year_quarter, y = mean_stay, group = sex, colour = sex)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Mean Length of Stay by Sex",
    x = "Quarter ending",
    y = "Days")
`summarise()` has grouped output by 'year_quarter'. You can override using the `.groups` argument.

Question 1: To what extent are the ‘winter crises’ reported by the media real?

age_sex %>% 
  group_by(winter) %>% 
  summarise(mean_stays = mean(stays, na.rm = TRUE),
            mean_length_of_stay = mean(average_length_of_stay, na.rm = TRUE))

Doesn’t look like a significant difference.

HYPOTHESIS TESTING

h0: the number of stays (both sexes total) is not significantly greater in winter quarters relative to non-winter quarters h1: the number of stays is greater in winter quarters

# Boxplot
age_sex %>% 
  ggplot(aes(x = stays, y = winter, fill = winter)) +
  geom_boxplot()


# Set up null distriburtion
null_dist_winter <- age_sex %>% 
  specify(stays ~ winter) %>% 
  hypothesise(null = "independence") %>% 
  generate(reps = 1000, type = "permute") %>% 
  calculate(stat = "diff in means", order = c(T, F))

# Calculate the observed stat
obs_stat_winter <- age_sex %>% 
  specify(stays ~ winter) %>% 
  calculate(stat = "diff in means", order = c(T, F))
            
# Visualise confidence interval 
null_dist_winter %>% 
  visualise() +
  shade_p_value(obs_stat = obs_stat_winter, direction = "right")


# Find p-value
pvalue_winter <- null_dist_winter %>% 
  get_p_value(obs_stat = obs_stat_winter, direction = "right")
pvalue_winter
NA

With a p-value around 0.12, we fail to reject the null hypothesis. The number of hospital stays is, on average, greater in winter, but not significantly so.

Question 2: How has the Covid-19 pandemic affected provision of acute care in Scotland?

Overall

age_sex %>% 
  group_by(post_covid) %>% 
  summarise(mean_stays = mean(stays, na.rm = TRUE),
            mean_length_of_stay = mean(average_length_of_stay, na.rm = TRUE))

Drop in the number of stays….significant? Increase in length of stay….significant?

Hypthesis Test h0: There is no significant decrease post-covid in the overall number of stays in hospital. h1: There are significantly fewer stays in hospital post-covid.

age_sex %>% 
  ggplot(aes(x = stays, y = post_covid, fill = post_covid)) +
  geom_boxplot()


null_distribution <- age_sex %>%
  specify(response = stays, explanatory = post_covid) %>% 
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in means", order = c(TRUE, FALSE))

observed_stat <- age_sex %>% 
  specify(stays ~ post_covid) %>% 
  calculate(stat= "diff in means", order = c(TRUE, FALSE))

observed_stat
Response: stays (numeric)
Explanatory: post_covid (factor)
null_distribution %>% 
  visualise(bins = 30) +
  shade_p_value(obs_stat = observed_stat, direction = "left")


null_distribution %>% 
  get_p_value(obs_stat = observed_stat, direction = "left")
Warning: Please be cautious in reporting a p-value of 0. This result is an approximation based on the number of `reps` chosen in the `generate()` step. See `?get_p_value()` for more information.

Reject null hypothesis. There are significantly fewer stays in hospital post-covid. Why?

Post-covid changes of different sexes

age_sex %>% 
  group_by(sex, post_covid) %>% 
  summarise(mean_stays = mean(stays, na.rm = TRUE),
            mean_length_of_stay = mean(average_length_of_stay, na.rm = TRUE))
`summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.

Post-covid, the number of stays in both groups has decreased by a lot. Has one sex been affected more than the other? Post-covid, the length of stays in both groups has increased by a small amount. Has one sex been affected more than the other?

# Wrangling data set
male_stays <- age_sex %>% 
  filter(sex == "Male") %>% 
  rename(male_stays = stays,
         average_length_of_male_stay = average_length_of_stay) %>% 
  mutate(id = str_c(year_quarter, hb, location, admission_type, age)) %>% 
  select(id, male_stays, average_length_of_male_stay, winter, post_covid)

female_stays <- age_sex %>% 
  filter(sex == "Female") %>% 
  rename(female_stays = stays,
         average_length_of_female_stay = average_length_of_stay) %>% 
  mutate(id = str_c(year_quarter, hb, location, admission_type, age)) %>% 
  select(id, female_stays, average_length_of_female_stay, winter, post_covid)

age_sex_stay_deltas <- full_join(male_stays, female_stays, by = "id") %>%
  mutate(female_stays = case_when(female_stays > 0 ~ female_stays,
                                  .default = 0),
         average_length_of_female_stay =
           case_when(average_length_of_female_stay > 0 ~ average_length_of_female_stay,
                     .default = 0),
         male_stays = case_when(male_stays > 0 ~ male_stays,
                                .default = 0),
         average_length_of_male_stay =
           case_when(average_length_of_male_stay > 0 ~ average_length_of_male_stay,
                     .default = 0)) %>% 
  rename(winter = winter.x,
         post_covid = post_covid.x) %>% 
  mutate(winter = case_when(winter == TRUE ~ TRUE,
                            winter == FALSE ~ FALSE,
                            .default = winter.y)) %>%
  mutate(post_covid = case_when(post_covid == TRUE ~ TRUE,
                                post_covid == FALSE ~ FALSE,
                                .default = post_covid.y)) %>%
  mutate(more_female_stays = female_stays - male_stays,
         females_stay_longer = average_length_of_female_stay - average_length_of_male_stay) %>% 
  select(more_female_stays, females_stay_longer, winter, post_covid)

Number of hospital stays by sex pre/post covid

Hypothesis test: h0: There is no significant change post-covid in the difference between the number of male and female stays in hospital. h1: There is a significant change post-covid in the difference between the number of male and female stays in hospital.

age_sex_stay_deltas %>% 
  ggplot(aes(x = more_female_stays, y = post_covid, fill = post_covid)) +
  geom_boxplot()


null_distribution <- age_sex_stay_deltas %>%
  specify(response = more_female_stays, explanatory = post_covid) %>% 
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c(TRUE, FALSE))

observed_stat <- age_sex_stay_deltas %>% 
  specify(more_female_stays ~ post_covid) %>% 
  calculate(stat= "diff in means", order = c(TRUE, FALSE))

observed_stat
Response: more_female_stays (numeric)
Explanatory: post_covid (factor)
null_distribution %>% 
  visualise(bins = 30) +
  shade_p_value(obs_stat = observed_stat, direction = "both")


null_distribution %>% 
  get_p_value(obs_stat = observed_stat, direction = "both")

Reject null hypothesis There is a significant change post-covid in the difference between the number of male and female stays in hospital. Male stays are lower and the gap has narrowed, so men have been affected more negatively.

Length of hospital stays by sex pre/post covid

Hypothesis test 2: h0: There is no significant change post-covid in the difference between the length of stays of men and women staying in hospital. h1: There is a significant change post-covid in the difference between the length of stays of men and women staying in hospital.

age_sex_stay_deltas %>% 
  ggplot(aes(x = females_stay_longer, y = post_covid, fill = post_covid)) +
  geom_boxplot()


null_distribution <- age_sex_stay_deltas %>%
  specify(response = females_stay_longer, explanatory = post_covid) %>% 
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c(TRUE, FALSE))

observed_stat <- age_sex_stay_deltas %>% 
  specify(females_stay_longer ~ post_covid) %>% 
  calculate(stat= "diff in means", order = c(TRUE, FALSE))

observed_stat
Response: females_stay_longer (numeric)
Explanatory: post_covid (factor)
null_distribution %>% 
  visualise(bins = 30) +
  shade_p_value(obs_stat = observed_stat, direction = "both")


null_distribution %>% 
  get_p_value(obs_stat = observed_stat, direction = "both")

Fail to reject null hyposthesis There is no significant change post-covid in the difference between the length of stays of men and women staying in hospital.

---
title: "R Notebook"
output: html_notebook
---

# Hospital Activity and Patient Demographics

https://www.opendata.nhs.scot/dataset/inpatient-and-daycase-activity/resource/00c00ecc-b533-426e-a433-42d79bdea5d4

# Activity by Board of Treatment, Age and Sex
```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(infer)
```

```{r message=FALSE, warning=FALSE}
age_sex <- read_csv("../clean_data/age_and_sex.csv")
```

```{r eval=FALSE, include=FALSE}
glimpse(age_sex)

str(age_sex)

skimr::skim(age_sex)
```

# Sex
```{r}
# Number of "stays" (total number of days patients spend in hospital during a continuous inpatient stay)
# Consistently Female > Male by around 50,000 per quarter
age_sex %>% 
  select(year_quarter, sex, stays) %>%
  group_by(year_quarter, sex) %>%
  summarise(stays_count = sum(stays)) %>% 
  ungroup() %>% 
  ggplot(aes(x = year_quarter, y = stays_count, colour = sex, group = sex)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Stays by sex",
    x = "Quarter ending",
    y = "Stay count")


# Mean length of stay
# No clear trend...maybe Female > Male on average
# Range is small
age_sex %>% 
  select(year_quarter, sex, average_length_of_stay) %>% 
  group_by(year_quarter, sex) %>% 
  summarise(mean_stay = mean(average_length_of_stay, na.rm = TRUE)) %>% 
  ungroup() %>%
  ggplot(aes(x = year_quarter, y = mean_stay, group = sex, colour = sex)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Mean Length of Stay by Sex",
    x = "Quarter ending",
    y = "Days")
```

# Question 1: To what extent are the ‘winter crises’ reported by the media real?

```{r}
age_sex %>% 
  group_by(winter) %>% 
  summarise(mean_stays = mean(stays, na.rm = TRUE),
            mean_length_of_stay = mean(average_length_of_stay, na.rm = TRUE))
```
Doesn't look like a significant difference.

### HYPOTHESIS TESTING

h0: the number of stays (both sexes total) is not significantly greater in winter quarters relative to non-winter quarters
h1: the number of stays is greater in winter quarters

```{r}
# Boxplot
age_sex %>% 
  ggplot(aes(x = stays, y = winter, fill = winter)) +
  geom_boxplot()

# Set up null distriburtion
null_dist_winter <- age_sex %>% 
  specify(stays ~ winter) %>% 
  hypothesise(null = "independence") %>% 
  generate(reps = 1000, type = "permute") %>% 
  calculate(stat = "diff in means", order = c(T, F))

# Calculate the observed stat
obs_stat_winter <- age_sex %>% 
  specify(stays ~ winter) %>% 
  calculate(stat = "diff in means", order = c(T, F))
            
# Visualise confidence interval 
null_dist_winter %>% 
  visualise() +
  shade_p_value(obs_stat = obs_stat_winter, direction = "right")

# Find p-value
pvalue_winter <- null_dist_winter %>% 
  get_p_value(obs_stat = obs_stat_winter, direction = "right")
pvalue_winter

```
With a p-value around 0.12, we fail to reject the null hypothesis.
The number of hospital stays is, on average, greater in winter, but not significantly so.


# Question 2: How has the Covid-19 pandemic affected provision of acute care in Scotland?
- Demographic: Who is most affected by this issue? (Who should be targeted with efforts?)

## Overall
```{r}
age_sex %>% 
  group_by(post_covid) %>% 
  summarise(mean_stays = mean(stays, na.rm = TRUE),
            mean_length_of_stay = mean(average_length_of_stay, na.rm = TRUE))
```
Drop in the number of stays....significant?
Increase in length of stay....significant?

*Hypthesis Test*
h0: There is no significant decrease post-covid in the overall number of stays in hospital.
h1: There are significantly fewer stays in hospital post-covid.
```{r}
age_sex %>% 
  ggplot(aes(x = stays, y = post_covid, fill = post_covid)) +
  geom_boxplot()

null_distribution <- age_sex %>%
  specify(response = stays, explanatory = post_covid) %>% 
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in means", order = c(TRUE, FALSE))

observed_stat <- age_sex %>% 
  specify(stays ~ post_covid) %>% 
  calculate(stat= "diff in means", order = c(TRUE, FALSE))

observed_stat

null_distribution %>% 
  visualise(bins = 30) +
  shade_p_value(obs_stat = observed_stat, direction = "left")

null_distribution %>% 
  get_p_value(obs_stat = observed_stat, direction = "left")
```
*Reject null hypothesis.*
*There are significantly fewer stays in hospital post-covid.*  Why?


## Post-covid changes of different sexes
```{r}
age_sex %>% 
  group_by(sex, post_covid) %>% 
  summarise(mean_stays = mean(stays, na.rm = TRUE),
            mean_length_of_stay = mean(average_length_of_stay, na.rm = TRUE))
```
Post-covid, the number of stays in both groups has decreased by a lot. Has one sex been affected more than the other?
Post-covid, the length of stays in both groups has increased by a small amount.  Has one sex been affected more than the other?

```{r}
# Wrangling data set
male_stays <- age_sex %>% 
  filter(sex == "Male") %>% 
  rename(male_stays = stays,
         average_length_of_male_stay = average_length_of_stay) %>% 
  mutate(id = str_c(year_quarter, hb, location, admission_type, age)) %>% 
  select(id, male_stays, average_length_of_male_stay, winter, post_covid)

female_stays <- age_sex %>% 
  filter(sex == "Female") %>% 
  rename(female_stays = stays,
         average_length_of_female_stay = average_length_of_stay) %>% 
  mutate(id = str_c(year_quarter, hb, location, admission_type, age)) %>% 
  select(id, female_stays, average_length_of_female_stay, winter, post_covid)

age_sex_stay_deltas <- full_join(male_stays, female_stays, by = "id") %>%
  mutate(female_stays = case_when(female_stays > 0 ~ female_stays,
                                  .default = 0),
         average_length_of_female_stay =
           case_when(average_length_of_female_stay > 0 ~ average_length_of_female_stay,
                     .default = 0),
         male_stays = case_when(male_stays > 0 ~ male_stays,
                                .default = 0),
         average_length_of_male_stay =
           case_when(average_length_of_male_stay > 0 ~ average_length_of_male_stay,
                     .default = 0)) %>% 
  rename(winter = winter.x,
         post_covid = post_covid.x) %>% 
  mutate(winter = case_when(winter == TRUE ~ TRUE,
                            winter == FALSE ~ FALSE,
                            .default = winter.y)) %>%
  mutate(post_covid = case_when(post_covid == TRUE ~ TRUE,
                                post_covid == FALSE ~ FALSE,
                                .default = post_covid.y)) %>%
  mutate(more_female_stays = female_stays - male_stays,
         females_stay_longer = average_length_of_female_stay - average_length_of_male_stay) %>% 
  select(more_female_stays, females_stay_longer, winter, post_covid)
```


### Number of hospital stays by sex pre/post covid
Hypothesis test:
h0: There is no significant change post-covid in the difference between the number of male and female stays in hospital.
h1: There is a significant change post-covid in the difference between the number of male and female stays in hospital.
```{r}
age_sex_stay_deltas %>% 
  ggplot(aes(x = more_female_stays, y = post_covid, fill = post_covid)) +
  geom_boxplot()

null_distribution <- age_sex_stay_deltas %>%
  specify(response = more_female_stays, explanatory = post_covid) %>% 
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c(TRUE, FALSE))

observed_stat <- age_sex_stay_deltas %>% 
  specify(more_female_stays ~ post_covid) %>% 
  calculate(stat= "diff in means", order = c(TRUE, FALSE))

observed_stat

null_distribution %>% 
  visualise(bins = 30) +
  shade_p_value(obs_stat = observed_stat, direction = "both")

null_distribution %>% 
  get_p_value(obs_stat = observed_stat, direction = "both")
```
*Reject null hypothesis*
*There is a significant change post-covid in the difference between the number of male and female stays in hospital.*
*Male stays are lower and the gap has narrowed, so men have been affected more negatively.*

### Length of hospital stays by sex pre/post covid
Hypothesis test 2:
h0: There is no significant change post-covid in the difference between the length of stays of men and women staying in hospital.
h1: There is a significant change post-covid in the difference between the length of stays of men and women staying in hospital.

```{r}
age_sex_stay_deltas %>% 
  ggplot(aes(x = females_stay_longer, y = post_covid, fill = post_covid)) +
  geom_boxplot()

null_distribution <- age_sex_stay_deltas %>%
  specify(response = females_stay_longer, explanatory = post_covid) %>% 
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c(TRUE, FALSE))

observed_stat <- age_sex_stay_deltas %>% 
  specify(females_stay_longer ~ post_covid) %>% 
  calculate(stat= "diff in means", order = c(TRUE, FALSE))

observed_stat

null_distribution %>% 
  visualise(bins = 30) +
  shade_p_value(obs_stat = observed_stat, direction = "both")

null_distribution %>% 
  get_p_value(obs_stat = observed_stat, direction = "both")
```
*Fail to reject null hyposthesis*
*There is no significant change post-covid in the difference between the length of stays of men and women staying in hospital.*






